Abstract

Background: Streptococcus equi ssp. zooepidemicus (S. zooepidemicus) is an important pathogen causing swine
streptococcosis in China. Pathogenicity islands (PAIs) of S. zooepidemicus have been transferred among bacteria
through horizontal gene transfer (HGT) and play important roles in the adaptation and increased virulence of
S. zooepidemicus. The present study used comparative genomics to examine the different pathogenicities of
S. zooepidemicus.

Results: The genome of S. zooepidemicus ATCC35246 (Sz35246) comprises a single circular chromosome of
2,167,264 bp with a GC content of 41.65%. Comparative genome analysis of Sz35246, S. zooepidemicus MGCS10565
(Sz10565), Streptococcus equi ssp. equi 4047 (Se4047) and S. zooepidemicus H70 (Sz70) identified 320
Sz35246-specific genes, clustered into three toxin-antitoxin (TA) system PAIs and one restriction modification
system (RM system) PAI. These four acquired PAIs encode proteins that may contribute to the overall pathogenic
capacity and fitness of this bacterium to adapt to different hosts. Analysis of the in vivo and in vitro
transcriptomes of this bacterium revealed differentially expressed PAI genes and non-PAI genes, suggesting that
Sz35246 possesses mechanisms for infecting animals and adapting to a wide range of host environments. Analysis
of the genome identified potential Sz35246 virulence genes. Genes of the Fim III operon were presumed to be
involved in breaking the host-restriction of Sz35246.

Conclusion: Genome-wide comparisons of Sz35246 with three other strains and transcriptome analysis revealed
novel genes related to bacterial virulence and breaking the host-restriction. Four specific PAIs, which were judged 
to have been transferred into Sz35246 genome through HGT, were identified for the first time. Further analysis of 
the TA and RM systems in the PAIs will improve our understanding of the pathogenicity of this bacterium and 
could lead to the development of diagnostics and vaccines. 



Background

PAIs play important roles in the adaptation and increased virulence of pathogens. Bacterial PAIs often
encode both effector molecules responsible for disease and secretion systems that deliver these effectors
to host cells [1,2]. PAIs are a distinct type of genomic island. PAIs contain mobile genetic elements
(MGEs), which were acquired by the bacteria through HGT. Bacterial genomes contain various types of MGEs,
such as transposons, plasmids, and bacteriophages. All of these elements may be acquired by HGT. Many MGEs
serve as shuttles for genes that are beneficial to bacteria during their proliferation in a host
environment. Several MGEs
Pathogenic bacteria often make use of suicide mechanisms,
in which the death of individual cells benefits
the survival of the population. This mechanism is regulated
by the toxin-antitoxin system (TA system), which
is related to DNA replication, mRNA stability, protein
synthesis, cell-wall biosynthesis and ATP synthesis [4].
The ε antitoxin-ζ toxin system (ε/ζ system) is a type II
TA system. It is distributed over plasmids and chromosomes
of various pathogenic bacteria [5]. These systems
benefit the stability of the genomic island in the
bacterial genome.

S. zooepidemicus is the ancestor of Streptococcus equi
ssp. equi (S. equi), and these two subspecies express many of
the same proteins and virulence factors. However, unlike
S. equi, which is host-restricted and only infects horses,
S. zooepidemicus has no host preference. S. zooepidemicus
is primarily an opportunistic pathogen infecting a wide
variety of animal species, including important domestic
species, which makes it a pathogen of veterinary concern.
S. zooepidemicus causes mastitis in cows and mares, and
is the most frequently isolated opportunistic pathogen
of horses [6]. Occasionally, S. zooepidemicus can infect
humans via zoonotic transmission from infected animals
and cause invasive infections such as
septicemia and meningitis [7,8]. In 1975, Sichuan province
experienced an S. zooepidemicus outbreak that
resulted in the death of 300,000 pigs and great economic
losses. S. zooepidemicus is an important pathogen
of streptococcal diseases in swine [9,10] and it
remains a great threat to Chinese swine breeding. In the
present study, we used comparative genomic analyses
between S. zooepidemicus ATCC35246 and other published
S. zooepidemicus strains [11,12] to investigate the
mechanisms underlying the differing pathogenicities of
Streptococcus equi subspecies. In particular, we tried to ascertain
how S. zooepidemicus ATCC35246 is able to cause such
a serious disease in pigs. We determined the complete
genome sequence of S. zooepidemicus ATCC35246
(Sz35246), a virulent strain isolated from a dead pig in
China. The complete genome sequence not only permitted
detailed analysis of the phylogenetic relationship between
species, but also provided insights into the
biology and pathogenic capacity of this streptococcus.

Results and discussion 

Genomic features and basic transcriptomic structure 

The 2,167,264-bp genome of Sz35246 comprises a single
circular chromosome with a GC content of 41.65%
(Additional file 1: Table S1 & Figure 1), and the genome
information has been reported previously [11]. The GC
content is similar to that of Streptococcus equi subsp.
zooepidemicus MGCS10565 (Sz10565) [12], Streptococcus
equi subsp. equi 4047 (Se4047) and Streptococcus
equi subsp. zooepidemicus H70 (Sz70) [13]. The genome
Figure 1 Circular representation of the S. zooepidemicus
ATCC35246 genome and comparative genome results. The ten
circles (outer to inner) show the following. The outer three circles
represent Sz35246 protein-encoding genes homologous to those of
Sz10565, Sz70 and Se4047, respectively. The fourth circle represents
the PAIs in the Sz35246 chromosome. The fifth circle shows the
chromosome position scaled in kb from oriC. The sixth and seventh
circles show the coding sequences on the plus and minus strands,
respectively. All genes are color-coded based on the COG functional
categories: cyan, information storage and processing; yellow, cellular
processes and signaling; magenta, metabolism; and black, poorly
characterized. The eighth circle shows rRNA in red and tRNA in blue.
The ninth circle shows the GC content (in 1-kb windows). Values that
are greater than or less than the average (41.65%) are shown in
green and red, respectively. The tenth circle shows the GC skew
curve (10-kb window and 1-kb incremental shift). The values for plus
and minus strands are shown in cobalt blue and purple, respectively.



contains 2,087 protein-encoding genes, 57 tRNA genes,
and five 5S-16S-23S rRNA operon gene clusters. Among
the protein-coding genes, 416 (19.93%) are predicted to
encode conserved hypothetical proteins that are similar
to proteins of unknown function in other genomes, and
137 hypothetical genes (6.56%) have no matches in the
nr protein database (Additional file 1: Table S1). The
remaining 1,534 genes were assigned putative functions.
Eighty-one genes were identified as mobile elements, including
those encoding a competence protein, a phage-associated
protein, a conjugation protein, a transposase
and a site-specific recombinase, suggesting that these
elements are used to take up and incorporate foreign
DNA and are involved in reconstructing the genome
architecture. Furthermore, global transcriptome analysis
of Sz35246 using RNA-seq confirmed that 2,048 of the
2,087 ORFs are expressed, but with different sequence
coverages in vitro and in vivo (Additional file 1: Table S1).
Comparative gene expression analysis reveals that 252
genes are upregulated and 142 genes are downregulated
in vivo (Additional file 2: Table S2) by more than a 2-fold change
in reads per kilobase per million (RPKM) values (p <
0.001). The upregulated genes include 67 hypothetical
protein-coding genes and 28 response regulator, transcription
regulator and chaperone protein-encoding
genes, suggesting that the differential expression of these
ORFs plays an important role in the survival of Sz35246
within the different host environments.
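
As an illustration of the 2-fold RPKM criterion described above, the following minimal sketch computes RPKM from raw read counts and labels a gene as up- or downregulated in vivo. The function names, the read counts and the library sizes are hypothetical and are not taken from the study; the significance test (p < 0.001) applied in the analysis is not reproduced here.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def classify(rpkm_in_vitro, rpkm_in_vivo, fold=2.0):
    """Label a gene by its in vivo / in vitro RPKM ratio.

    Only the fold-change threshold is applied; no significance test.
    """
    if rpkm_in_vitro <= 0 or rpkm_in_vivo <= 0:
        return "undetermined"
    ratio = rpkm_in_vivo / rpkm_in_vitro
    if ratio > fold:
        return "up"
    if ratio < 1.0 / fold:
        return "down"
    return "unchanged"


# Hypothetical counts for one 1,200-bp gene in two sequencing libraries.
in_vitro = rpkm(read_count=300, gene_length_bp=1200, total_mapped_reads=5_000_000)
in_vivo = rpkm(read_count=900, gene_length_bp=1200, total_mapped_reads=6_000_000)
print(in_vitro, in_vivo, classify(in_vitro, in_vivo))
```

Normalizing by both gene length and library size is what makes the in vitro and in vivo values comparable even though the two libraries differ in sequencing depth.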

Additionally, we found that some genes, including malA
(SeseC_01626), malD (SeseC_01627), malE (SeseC_01633,
SeseC_01622), malF (SeseC_01624, SeseC_01630), malG
(SeseC_01625) and malQ (SeseC_01617), were upregulated
when Sz35246 infected mice. These genes are related to
maltose transport and metabolism and utilization of carbohydrates,
which is essential for the ability of pathogenic
bacteria to cause disease. Group A Streptococcus (GAS)
strains express malE on their surface, and the transcript
levels of the malE gene were significantly increased during
growth in human saliva compared to common medium.
MalE may contribute to the ability of GAS to colonize the
oropharynx by utilizing maltose [14]. In addition, studies in
S. pneumoniae have shown that deletions in carbon metabolism
genes, including the maltose operon, lead to
decreased production of known virulence factors, such
as capsular polysaccharide and cholera toxin [15]. MalE
of Sz35246 is a maltodextrin-binding protein, which
also binds longer maltodextrins (e.g., maltotriose and
maltotetraose). The upregulation of this protein and
other maltose utilization-related proteins may contribute
to the infection of Sz35246. Further investigation into
these carbohydrate transport and metabolism pathway
genes may yield novel insights into the pathogenesis of
Sz35246. We also observed that certain known virulence
factors were upregulated during Sz35246 infection, for
example, streptokinase (SeseC_02411) and fibronectin-binding
protein (sfs, SeseC_00464). The upregulation of
bacteriocin (SeseC_02042) could help Sz35246 compete
with other bacteria that colonize the host.

Comparative genomic analysis and pathogenicity islands (PAIs)

Comparative analysis of the Sz35246 genome with three other
genomes revealed that the evolution of Sz35246 has
been driven by genomic rearrangements and HGT. X-alignment
analysis of Sz35246 versus Sz10565 [12], Se4047
and Sz70 [13] revealed that small and large scale chromosome
inversions have occurred during replication termination
between Sz35246 and Se4047 and between Sz35246
and Sz70 (Figure 2). These genome rearrangements may
influence the transcription of surrounding genes after the
HGT process, which has contributed to the shaping of the
Sz35246 genome.



The comparative analysis of the Sz35246 genome with
the three other genomes identified 1,397 orthologous
genes that are shared by all four strains (Figure 3). In
addition, 191, 184 and 93 genes are shared between
Sz35246 and Sz10565, Sz70 and Se4047, respectively,
suggesting that Sz35246 and Sz10565 are more closely
related than the other strains. Furthermore, X-alignment
analysis of Sz35246 versus Sz10565 [12], Se4047 and
Sz70 [13] also suggested that Sz35246 is closer to
Sz10565 than to the other two strains. Phylogenetic
trees of the four strains were constructed based on the
sequences of the 1,397 orthologous genes using minimum
evolution and neighbor joining phylogenetic
reconstruction methods available in the MEGA package
(Figure 4). The phylogenetic trees also indicated that
Sz35246 is much closer to Sz10565 than to the other
two strains, which is consistent with the genome-scale
alignment analysis.
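
The shared and strain-specific gene counts above amount to set operations on ortholog assignments. The sketch below shows one way such a comparison could be expressed, assuming each genome has already been reduced to a set of ortholog-group identifiers; the identifiers and the function name are hypothetical, and the ortholog detection step itself is not shown. Here "shared only" is taken to mean shared with exactly one other strain, which is an assumption about how the Venn diagram is read.

```python
def compare_gene_sets(reference, others):
    """Split the reference gene set into core, reference-specific and
    pairwise-only categories relative to the other genomes.

    reference: set of ortholog-group IDs in the reference genome
    others:    dict mapping strain name -> set of ortholog-group IDs
    """
    ref = set(reference)
    other_sets = [set(genes) for genes in others.values()]
    core = ref.intersection(*other_sets)      # present in every genome
    specific = ref.difference(*other_sets)    # present in the reference only
    shared_only = {}
    for name, genes in others.items():
        rest = [set(g) for n, g in others.items() if n != name]
        # shared with this strain and with no other strain
        shared_only[name] = (ref & set(genes)).difference(*rest)
    return core, specific, shared_only


# Toy data with hypothetical ortholog-group IDs.
sz35246 = {"g1", "g2", "g3", "g4", "g5", "g6"}
others = {
    "Sz10565": {"g1", "g2", "g3"},
    "Sz70": {"g1", "g4"},
    "Se4047": {"g1", "g2"},
}
core, specific, shared_only = compare_gene_sets(sz35246, others)
print(sorted(core), sorted(specific))
print({name: sorted(genes) for name, genes in shared_only.items()})
```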

Further comparative analysis showed that 320 genes 
are specific to Sz35246, which include 197 (61%) that 
were annotated as "hypothetical protein", among which 
149 encode small proteins with lengths of no more than 
100 amino acids (Additional file 3: Table S3). These 
small proteins are annotated as hypothetical proteins; 
however, certain highly conserved hypothetical proteins 
may play important roles in response to specific environ- 
mental stresses and host adaptation. For example, these 
small proteins have been reported to have evolved in re- 
sponse to specific environmental stress and to partici- 
pate in the suppression of the type III secretion system 
[16]. The remaining functional genes encode 40 virulence 
proteins, 14 phage-associated proteins, eight transposases, 
five site-specific recombinases, a conjugation protein, a 




Figure 3 Whole genome comparison among the Sz35246,
Sz10565, Sz70 and Se4047 genomes. Venn diagram representing
unique and shared gene numbers among Sz35246, Sz10565,
Sz70 and Se4047.
Figure 4 Phylogenetic trees were inferred using neighbour joining (A), maximum parsimony (B) and minimum evolution (C) among
the four genomes based on concatenated alignments of 1397 orthologous proteins.



phage integrase, a phage recombinase, an IS transposase 
and a relaxase. These results suggest that the Sz35246 
genome acquired these virulence genes through HGT, ei- 
ther by transduction with phages or by conjugation with 
plasmids or chromosomal fragments. 

Furthermore, these Sz35246-specific genes are tightly
clustered into four regions, varying in length from about 10
kb to 50 kb, which were termed PAIs (SeseCisland_1 to SeseCisland_4)
(Figure 1). An orthologous gene analysis between Sz35246
and Sz10565, Sz70 and Se4047 confirmed that these genomic
islands are present in the Sz35246 genome only (Figure 1).
The genes located in these four PAIs might be involved in
the pathogenesis of swine streptococcosis caused by Sz35246
and in its strong virulence. These islands were further confirmed
by the annotation information and the co-linearity
comparison of the Sz35246 genome with those of the three
other genomes.

Significantly, sequence and annotation analyses of
these islands revealed that SeseCisland_1, SeseCisland_2
and SeseCisland_3 contain the same type of virulence
genes involved in the bacterial TA systems that have
been reported to play subtle roles in the survival of bacteria
under harsh natural environments [4, 17]. Based on
previous analyses of TA systems in Escherichia coli K12
[4], Mycobacterium tuberculosis [17] and Mycobacterium
smegmatis [18], we speculated that acquired TA systems
might play a positive role in survival of Sz35246 under
different host environments. The RM system is used by
bacteria to protect themselves from foreign DNA, such
as bacteriophages and other viruses. Genes encoding
RM system proteins, which include a restriction endonuclease
and a restriction endonuclease control protein,
were identified in a cluster in SeseCisland_4. Based on
these results, we speculated that the acquired RM system
might be involved in defense against infection by foreign
DNA such as prophages and viruses. Thus, the PAIs may
allow Sz35246 to adapt to various host stress conditions
and to defend itself against infection by prophages, other
bacteria and viruses. The expression and potential impact
of these islands on the physiology, pathogenesis and host
adaptation of Sz35246 are discussed below.

I. SeseCisland_1: Phd/Doc TA system 

SeseCisland_1 contains 54 genes (53,095 bp), 42 of which
are Sz35246-specific genes, including mobile elements
resembling the IS200 family transposase (SeseC_00874),
a prophage site-specific recombinase resolvase family
protein (SeseC_00919), a putative conjugal transfer protein
(SeseC_00927), a conjugation protein (SeseC_00935)
and a tnpX site-specific recombinase family protein
(SeseC_00939), suggesting that this island is an integrative
conjugative element (Additional file 4: Table S4 &
Figure 5A). SeseCisland_1 contains 20 structural phage
loci, indicating that MGEs, such as phages, are also implicated
in HGT in Streptococcus species. Further analysis
showed that the island has an abnormal GC skew
and that the island-located genes have an average G+C
content of 39.42% (Additional file 4: Table S4), which is
significantly different from the mean value for the genome
(41.65%) (p=0.002).
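
The G+C content and GC skew comparisons used to flag this island can be sketched as follows. This is a minimal illustration on toy sequences, not the pipeline used in the study: the function names and the sequences are hypothetical, GC skew is taken in its usual form (G - C)/(G + C), and the statistical test behind the reported p value is not reproduced.

```python
def gc_content(seq):
    """G+C content of a sequence, as a percentage."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq):
    """GC skew, (G - C) / (G + C); 0.0 if the sequence has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def sliding_windows(seq, window, step):
    """Yield (start, subsequence) for each full window along the sequence."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, seq[start:start + window]


# Toy sequences standing in for the chromosome and a candidate island.
genome = "ATGCGCGGATTACCGGTTAACGCGATATCGGCCAATTGGC"
island = genome[16:28]
print(round(gc_content(genome), 2), round(gc_content(island), 2))
for start, sub in sliding_windows(genome, window=10, step=5):
    print(start, round(gc_content(sub), 1), round(gc_skew(sub), 2))
```

For a real chromosome the window and step would be set to the values given in the Figure 1 legend (1-kb windows for GC content; a 10-kb window with a 1-kb shift for GC skew).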
Figure 5 Schematic representation of the putative PAIs and their expression in S. zooepidemicus ATCC35246. (A), SeseCisland_1; (B),
SeseCisland_2; (C), SeseCisland_3; (D), SeseCisland_4. The expression levels under in vitro and in vivo conditions are shown at single-nt
resolution in red and green, respectively. All genes are color-coded based on the annotation information as follows: yellow, toxin-antitoxin
system (TA system); blue, phage associated protein; green, other virulence protein; red, virB4 components; cyan, mobile elements; grey,
hypothetical protein.



A major feature within SeseCisland_1 is the presence of
two genes (SeseC_00898 and SeseC_00899) encoding proteins
homologous to addiction module antitoxin Phd protein
and killer Doc protein, respectively. The Phd-Doc TA
system is a type II TA system that was first identified in
bacteriophage P1 [19]. Phylogenetic trees based on Phd/Doc
proteins also suggested that the Phd/Doc proteins are highly
homologous to those of Streptococcus phage phi-m46.1
and phi-SsuD1 (Figure 6). The Phd/Doc mRNAs are co-expressed
from the same promoter and both are translated
into proteins. These proteins form a stable TA complex that
blocks the functions of the Doc toxin. Doc is a toxin that
kills plasmid-free segregants, and Phd is an unstable antidote
that neutralizes the toxin. Doc inhibits translation
elongation by association with the 30S ribosomal subunit
[20]. Under stress conditions or host change, Phd is degraded
by ATP-dependent serine proteases, such as ClpXP
protease [21], which frees the toxin from the TA
complex and induces cell growth inhibition and cell death
[22,23]. Interestingly, a gene (SeseC_00903) encoding a
protein homologous to the E. coli ClpXP protein is present
in this island. Doc toxins are expressed under both conditions
(in vivo and in vitro) (Figure 5A), which agrees with previous
reports that Doc causes cell growth arrest and death by
inhibiting translation without affecting transcription and
replication. The observations and results reported here
support the hypothesis that SeseCisland_1 helps Sz35246
to adapt to environmental and host changes.
Figure 6 Phylogeny of antitoxin Phd and addiction module killer Doc protein sequences. Thirty phd toxin sequences (A) and twenty-six
anti-toxin sequences (B) from loci of bacterial chromosomes, plasmids and phages were aligned. The sequences were aligned with ClustalW and
the genetic relationships inferred using the unweighted pair group method with arithmetic mean (UPGMA) implemented in MEGA 4.0 software.



II. SeseCisland_2: Fic/Doc TA system 

SeseCisland_2 contains an important open reading
frame (ORF), SeseC_01334, which encodes a protein
with a filamentation induced by cAMP (Fic) domain
(Additional file 5: Table S5 and Figure 5B). The Fic domain
is classified together with a second family of sequences,
doc (death on curing), in the Pfam protein
families database [24]. The Fic/Doc family protein sequence
was aligned against homologous proteins from
other bacteria. Interestingly, phylogenetic analysis revealed
that the Fic/Doc protein is homologous to that of
Fusobacterium nucleatum subsp. fusiforme (Figure 7).
Fic/Doc family proteins are known as members of a TA
system, and the functional sites are common to both families
[25]. The Fic protein has been reported to be involved in
cell division and synthesis of folate, indicating that
the Fic protein and cAMP are involved in a regulatory
mechanism of cell division via folate metabolism [26,27].
Fic family virulence proteins may be important in many
bacterial pathogens. For example, the immunoglobulin-binding
protein A (IbpA) of Histophilus somni contains a
direct repeat of two Fic domains, and mutation of IbpA or
just the Fic domain of IbpA decreased the virulence of this
bacterium. The Fic domain has been shown to covalently
Figure 7 Phylogeny of Fic/Doc domain protein sequences. Twelve bacterial protein sequences were aligned using ClustalW and the
genetic relationship trees were constructed with MEGA 4.0 software as for Figure 6.
modify host Rho GTPases with AMP, which may explain
how the Fic domain influences bacterial virulence [28].
Thus, the Fic family protein in SeseCisland_2 may be
involved in the pathogenicity of Sz35246.

SeseCisland_2 encodes 16 genes (17,293 bp), 10 of
which are Sz35246-specific. SeseCisland_2 also contains
certain mobile elements, including an endonuclease
relaxase (SeseC_01323), a bacterial mobilization protein
(SeseC_01324), an integrase/recombinase (SeseC_01328)
and a transposase protein (SeseC_01332). Thus, we
speculate that this region also plays important roles in
bacterial adaptation, virulence and physiology.

III. SeseCisland_3: ε/ζ TA system

SeseCisland_3 contains 21 Sz35246-specific genes
(Additional file 6: Table S6 and Figure 5C), the most
notable of which are two genes annotated as type II TA
system genes encoding a ζ toxin protein (SeseC_01875) and
an ε antitoxin protein (SeseC_01876). VirB4/VirB6/VirD4
components (SeseC_01908, SeseC_01912, SeseC_01914 and
SeseC_01916) from the type IV secretion system (T4SS)
are also present. Additionally, virulence-associated factors,
such as glucan-binding protein and abortive infection
protein, are also encoded by this region. All the
virulence genes are expressed under in vitro and in vivo
conditions. Several MGEs such as site-specific recombinases
(SeseC_01863, SeseC_01864 and SeseC_01865)
and transposases (SeseC_01867 and SeseC_01869) were
also identified in this island. The bioinformatics analysis
showed that a type II TA system, a type IV secretion
system and other virulence genes were present in this
island, which may contribute directly to the bacterium's
pathogenicity and host adaptation.

ε/ζ systems ensure stable plasmid inheritance by inducing
death in plasmid-deprived offspring cells. Members
of the ε/ζ systems have been found on resistance plasmids
in major human pathogens [29,30]. By contrast,
chromosomally encoded ε/ζ systems were reported to
contribute to virulence of pathogenic bacteria. Brown
et al. compared clinical serotype 3 isolates with ζ toxin
gene knockout strains in mixed systemic and respiratory
infections of mice, and thus connected the ζ toxin with
virulence in Streptococcus pneumoniae [31]. The ε/ζ system
also exists in the 89K pathogenicity island of
Streptococcus suis serotype 2. This bacterium is an important
zoonotic pathogen, causing more than 200 cases
of severe human infection worldwide [32]. The ζ toxin is
inhibited by its cognate antitoxin, ε. The structure of the
complex of ζ toxin inactivated by ε antitoxin (ε2ζ2) was
solved by X-ray crystallography [33]. Upon degradation
of ε, the ζ toxin is released, allowing this enzyme to inhibit
bacterial cell wall synthesis, which eventually triggers
autolysis [34]. The toxic effect of the ζ toxin has
also been demonstrated in a diverse array of organisms,
including Saccharomyces cerevisiae [35].

Phylogenetic analysis of ζ toxin proteins and ε antitoxin proteins
showed that the proteins from Sz35246 are highly
homologous to those of Streptococcus urinalis 2285-97,
Streptococcus intermedius F0395 and Streptococcus
vestibularis F0396 and are widely distributed in many bacteria
(Figure 8). The zeta toxin family has been reported
on plasmids [21,36,37], on bacterial chromosomes
[23,38] and in Streptococcus pneumoniae and
Streptococcus suis serotype 2 PAIs. The broad distribution
of this system within the bacterial kingdom suggests that it
uses a ubiquitous bacteriotoxic mechanism to overcome
host defenses and environmental changes. On the other
hand, we hypothesized that horizontal transfer of this
island may occur through a T4SS-mediated conjugation
process, because four gene products display similarities to
Streptococcus T4SS components.

IV. SeseCisland_4: RM system and virulence island

SeseCisland_4 contains eight Sz35246-specific genes
(from a total of 10 genes). The two mobile elements
(SeseC_02358 and SeseC_02362), a transposase IS1167
and a phage integrase (Additional file 7: Table S7 and
Figure 5D), suggest that this island has been acquired
by HGT from another microorganism. The major
feature of this island is three strain-specific genes
(SeseC_02360, SeseC_02361 and SeseC_02362) that were
annotated as RM system proteins, which protect bacteria
from foreign DNA, such as bacteriophages. The RM system
is a strategy that permits bacteria to live in different
environments [39], allowing bacteria to erect a barrier to
gene transfer and making them resistant to phage infection
[40]. Taken together, these data suggest that the RM
system is a remarkable characteristic of Sz35246 and is
probably involved in the adaptation of these bacteria to
different environmental conditions.

Relationship between PAIs and Sz35246 virulence 

To determine whether the genes located within the PAIs affect the
virulence of Sz35246, we deleted part of SeseCisland_3
from SeseC_01869 to SeseC_01898. PCR was used to
confirm the deletion (Figure 9A and Additional file 8:
Table S8), and sequencing results showed that exactly 28,606 bp
of SeseCisland_3 was deleted, including the genes belonging to
the ε/ζ TA system (Figure 9B). The deleted region started
with the Tn5252 transposon gene (SeseC_01869), and two repeat
sequences, including transposase genes (SeseC_01867
and SeseC_01901), were located at the flanks of the deleted
region. These two repeat sequences and the Tn5252 transposon
gene formed the structural basis for deleting such a
long fragment. The mutant strain ΔIsland3-Sz35246 and
wild-type Sz35246 were used to infect ICR mice to evaluate
the influence of partial PAI deletion on bacterial virulence.
Figure 8 Phylogeny of zeta toxin (PezT) (A) and epsilon antitoxin (PezA) (B) protein sequences. Eighty-eight zeta toxin protein sequences
(A) and seventy-two epsilon antitoxin protein sequences (B) from Streptococcus, Enterococcus, Oribacterium, Veillonella and Gardnerella bacterial
species were aligned. The genetic relationships were determined as detailed in Figure 6.




The percent survival significantly increased in ΔIsland3-Sz35246
infected mice (Figure 10). At 5 days post-infection,
only one of the ten mice was dead; however, of the mice
infected with wild-type Sz35246, only one was alive. The
survival curve indicated that partial deletion of a PAI did
affect the virulence of Sz35246, and that some of the
genes in the PAI are important for bacterial pathogenicity.
Genes located in the other three PAIs require further study
to determine their role in bacterial virulence.
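
The size of this difference can be checked with a simple calculation on the day-5 counts given above (1 of 10 mice dead in the mutant group, 9 of 10 dead in the wild-type group). The sketch below applies a two-sided Fisher's exact test to those counts. This is an illustrative substitute and not the test used in the study, which is not named in the text; survival curves are usually compared with a time-to-event method such as the log-rank test. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # small relative tolerance guards against floating-point ties
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-9))


# Rows: mutant, wild type. Columns: dead, alive at day 5.
p_value = fisher_exact_two_sided(1, 9, 9, 1)
print(p_value)
```

Under these assumptions the day-5 endpoint alone gives a p value far below 0.05, which is in line with the significant difference reported in the Figure 10 legend.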

Other potential virulence genes dispersed in the Sz35246 
genome 

Strain Sz70 was isolated from a nasal swab taken from a
healthy thoroughbred racehorse in Newmarket, England,
in 2000 [41]. A genome-wide comparison of Sz35246
with Sz70 identified Sz35246-specific genes, some of
which may be involved in the virulence of Sz35246 and
may provide clues as to why Sz35246 causes such
serious swine streptococcosis but other strains do not.
Virulence-associated protein E (vapE, SeseC_01325),
which was originally identified in Dichelobacter nodosus
[42], is part of a vap region of D. nodosus that is associated
with virulence [43]. The mechanism by which VapE
affects virulence has not been determined yet, but the
presence of an integrase gene, XerC (SeseC_01328), immediately
upstream of vapE suggested a role for bacteriophages
in the evolution and transfer of these bacterial
virulence determinants; i.e., it is possible that exchange
of this putative virulence factor with other bacteriophages
could take place [44]. Moreover, a vapE-like gene
has also been identified in a pathogenicity island of
Staphylococcus aureus [45]. The pathogens of a footrot
outbreak in a Debre Zeit swine farm were identified as
Staphylococcus aureus and Dichelobacter nodosus; both
bacteria contain the vapE gene. VapE has not been identified
in other strains of S. zooepidemicus, but only in
Sz35246. This gene may be related to Sz35246 pathogenicity
towards pigs. The role of the vapE gene in the virulence
of Sz35246 remains to be clarified.

Adherence is an essential requirement for invasion of
cells by bacterial pathogens. Long extracellular structures
resembling fimbriae mediate adhesion to components
of the host extracellular matrix, such as collagen
and fibronectin. We identified seven Sz35246-unique
proteins that contain an LPXTG motif (found in cell
wall anchor domains), including collagen-like protein
SclZ.1 (SeseC_00092), fibrinogen- and Ig-binding protein
precursor (SeseC_00180), cell surface protein
(SeseC_00619), T-antigen-like fimbrial structural subunit
protein (SeseC_02472), putative cell surface protein
(SeseC_02304), InlA-like domain containing cell
surface-anchored protein (SeseC_01462) and collagen-like
surface-anchored protein SclE (SeseC_00246). All
of these proteins are anchored on the bacterial surface
and may be involved in bacterial adhesion and
invasion.
Figure 9 PCR detection and schematic diagram of partial SeseCisland_3 deletion. (A) PCR detection of genes located in the SeseCisland_3
deleted region (markers are 2000 bp, 1000 bp, 750 bp, 500 bp, 250 bp and 100 bp); the gene numbers correspond to those in Additional file 8: Table
S8. (B) Genomic organization of the partial SeseCisland_3 deletion locus and its flanking repeat sequences in Sz35246. The fragment from
1551608 to 1580213 was knocked out, including the zeta toxin (PezT) and epsilon antitoxin (PezA) genes. The deleted region started with a
Tn5252 transposon protein gene and is flanked by two repeated regions. After the reciprocal recombination, only one repeat region remained.
Figure 10 Survival curves for ICR mice infected with the wild-type
Sz35246 and ΔIsland3-Sz35246. Two groups of eight-week-old
ICR mice were inoculated i.p. with 2.5x10^ CFU bacteria, and
mouse survival was monitored over a 5-day period. Data are
expressed as the mean percentage of live animals in each group
(n = 10). The virulences of these two strains were significantly
different (P<0.05).



Fibronectin (Fn)-binding proteins have been reported
to mediate the invasion of host cells without the need
for other bacterial factors [46]. Fn, which has received
much attention as a target of bacterial adhesins, is a
glycoprotein found in the extracellular matrix and body
fluids of vertebrates. Fn-binding proteins are found in
Streptococcus pyogenes (SfbI/F1), Staphylococcus aureus
(FnBPA and FnBPB), Streptococcus dysgalactiae (FnBA
and FnBB), and other bacterial species [47]. In previous
research, an fnz gene was found in S. zooepidemicus and
an sfs gene was only found in S. equi; both genes
encode a cell surface protein that binds Fn [48]. The sfs
gene (SeseC_00464) was identified in Sz35246 for the
first time. The transcriptome data showed that the sfs
gene was upregulated during infection of a host by Sz35246
(in vivo). Presumably, the expression of this gene promoted
bacterial pathogenicity by inhibiting the binding
between collagen and Fn.

The Sz35246 and Sz10565 genomes both have the Fim 
III operon (type III fimbriae) (SeseC_02471-02473 and 
SeseC_02475). The structural proteins of type III 
fimbriae have an amino-terminal secretion signal and a 
carboxy-terminal sorting signal, and their assembly into 
fimbriae is dependent on the adjacently encoded dedi- 
cated sortases [49]. Sz70 contains two loci that encode 
genes putatively required for pilus expression, but lacks 
this putative pilus locus. Recent studies of Salmonella 
enterica revealed that the presence of fimbriae increases 
the ability of host-restricted bacteria to invade normally 
restrictive cells [50]. Thus, we hypothesize that the presence 
of the Fim III operon might be associated with 
breaking host-restriction by S. zooepidemicus. 

Conclusions 

The genome and expression analysis of Sz35246 provided 
fundamental information on the physiology and potential 
pathogenic capacity of this bacterium. The comparison of 
the genomes of Sz35246, Se4047, Sz10565 and Sz70 identified 
genes that are specific to Sz35246. These genes may 
be related to the bacterium's pathogenic function, includ- 
ing causing swine streptococcosis and breaking host- 
restriction. We identified novel MGEs, which may have 
been involved in the evolution of Sz35246. The presence 
of the elements and the phylogenetic analysis indicated 
that this genome has been shaped by chromosomal inver- 
sion, recombination and HGT events. Sz35246 probably 
acquired its PAIs and certain specific genes through HGT. 
The TA systems present in three of the genomic 
islands of Sz35246 may be related to this strain's pathogenicity. 
Study of these systems will form the basis of our 
future research. The availability of the complete Sz35246 
genome sequence will facilitate further studies of this 
pathogen and the development of diagnostics and 
vaccines. 

Methods 

Strain and growth conditions 

S. zooepidemicus ATCC35246 was isolated from a dead 
pig in Sichuan, China. To prepare total cellular DNA 
from S. zooepidemicus ATCC35246, bacteria were grown 
in Bacto™ Todd-Hewitt Broth at 37°C, in a 10% CO2 
atmosphere. Total cellular DNA was isolated from the 
mid-exponential phase (OD600 = 0.6) culture using a 
Genomic Purification System (Promega). 

Preparation of RNA for transcriptome analysis 
From pure culture 

Cultures for preparing RNA samples were grown overnight 
at 37°C under aerobic conditions in liquid medium 
with shaking. Overnight pre-cultures were diluted in 
liquid medium and incubated at 37°C under aerobic 
conditions with shaking. Exponentially growing cells 
(OD600 = 0.6) were harvested by centrifugation for 10 
min at 10,000 rpm at 4°C. Total RNA was extracted as 
previously described [51]. RNA quality was assessed by 
determining the OD260/280 ratio with a Nanodrop 2000 
(Thermo) and by visualization following agarose gel 
electrophoresis. 

From infected mice organs 

Specific pathogen-free female BALB/c mice were intravenously 
infected with S. zooepidemicus ATCC35246 
[52]. At 24 h post-infection, the mice were sacrificed and 
dissected. The livers and spleens were harvested and im- 
mediately frozen in liquid nitrogen. The organs were 
stored at -80°C. Before RNA isolation, the organs were 
thawed on ice and homogenized in 20 ml of an ice-cold 
solution composed of 0.2 M sucrose/0.01% SDS. The 
homogenate was gently centrifuged for 20 min at 300 
rpm and filtered to remove large tissue debris. The tis- 
sue suspension was centrifuged for 20 min at 8000 rpm 
to pellet the bacteria. Centrifugations were performed at 
4°C. Bacterial RNA extraction was performed as pre- 
viously described [51]. RNA quality was assessed by de- 
termining the OD260/280 ratio with a Nanodrop 2000 
(Thermo) and by visualization following agarose gel 
electrophoresis. 

Genome sequencing and annotation 

Whole-genome sequencing was performed with the Roche 
454 genome sequencer FLX system and assembled with 
Newbler (version 2.0.01.14). The detailed sequencing and 
assembly methods have been described previously [52]. 
The complete genome sequence of S. zooepidemicus strain 
ATCC35246 has been deposited in the GenBank database 
with the accession number CP00290. The replication origin 
(oriC) was identified with Ori-Finder software [53]. 
Protein-coding genes were predicted with Glimmer 3.02 
[54] using the default settings and a cutoff at 90 nt. Anno- 
tation of these genes was performed by homology searches 
in the NCBI nonredundant protein database with 80% 
overlap (E-value < 1e-10), in the cluster of orthologous 
groups (COG) database [55], the InterPro member 
databases (InterProScan) [56] and the Kyoto Encyclopedia 
of Genes and Genomes (KEGG) pathway database [57], 
respectively. The tRNA genes and rRNA genes were identified 
using the tRNAscan-SE tool [58] and RNAmmer 1.2 
[59], respectively. Finally, genome annotation and the 
structure of the predicted genes were manually refined. 
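
The homology cutoffs above can be illustrated with a minimal sketch. It assumes that "80% overlap" means the alignment covers at least 80% of the query protein, which the text does not state explicitly. The `Hit` record, its field names and the helper functions are hypothetical and are not part of any annotation tool used in this study.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Hit:
    """One homology-search hit (hypothetical record layout)."""
    query_id: str
    subject_id: str
    query_length: int    # length of the query protein (aa)
    aligned_length: int  # query positions covered by the alignment (aa)
    evalue: float


def passes_cutoffs(hit: Hit, min_overlap: float = 0.80,
                   max_evalue: float = 1e-10) -> bool:
    """Apply the overlap and E-value thresholds quoted in the text.

    Assumption: overlap = aligned_length / query_length.
    """
    overlap = hit.aligned_length / hit.query_length
    return overlap >= min_overlap and hit.evalue < max_evalue


def best_hits(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep, for each query, the passing hit with the lowest E-value."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if not passes_cutoffs(hit):
            continue
        current = best.get(hit.query_id)
        if current is None or hit.evalue < current.evalue:
            best[hit.query_id] = hit
    return best


# Hypothetical example hits for two predicted genes.
example_hits = [
    Hit("gene_1", "subj_A", 300, 290, 1e-50),
    Hit("gene_1", "subj_B", 300, 295, 1e-80),
    Hit("gene_2", "subj_C", 200, 100, 1e-40),  # only 50% overlap, rejected
]
annotated = best_hits(example_hits)
```

Genes without any passing hit (such as `gene_2` here) would remain unannotated at this step and be candidates for manual refinement.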

Comparative genomic analysis 

Sequences and protein coding sequences for each 
strain (MGCS10565: CP001129.1; 4047: FM204883.1; 
H70: FM204884.1) were retrieved from NCBI 
(http://www.ncbi.nlm.nih.gov). The genomic co-linearity of 
the four genome sequences was analyzed using the MUMmer 3 
package [60]. Orthologous proteins were identified 
with Inparanoid and MultiParanoid [61]. The 
CLUSTAL W software [62] and MEGA4 software [63] 
were used to align the concatenated sequences from all 
orthologs and to construct phylogenetic trees. The Artemis 
Comparison Tool (ACT) [64] was used to view the overall 
comparison of S. zooepidemicus strain ATCC35246 with 
S. zooepidemicus MGCS10565, S. zooepidemicus H70 
and S. equi 4047. 

SOLiD RNA-seq library construction, sequencing and gene 
expression analysis 

The standard protocol from the SOLiD™ Small RNA Expression 
Kit (ABI) was used to construct the RNA-Seq 
library, and sequencing was performed on an ABI SOLiD 
4.0 sequencer. Reads with a quality value greater than 8 
were mapped to the S. zooepidemicus strain ATCC35246 
genome using the SOLiD™ System Analysis Pipeline Tool, 
allowing mismatches of up to five bases. The detailed mapping 
methods have been described previously [63], and 
rRNA reads were filtered before mapping. The expression 
level of each gene was calculated from read counts 
normalized by the total number of mapped reads and the gene 
length, using the RPKM method [65]. Differential 
expression of genes between the in vitro and 
in vivo libraries was analyzed based on the DEGseq 
modeling methods [66]. 
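
As an illustration of the RPKM normalization, the following minimal sketch computes reads per kilobase of gene per million mapped reads. The gene names, read counts and library size are hypothetical and are not taken from the Sz35246 data.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of gene per million mapped reads.

    RPKM = read_count * 1e9 / (gene_length_bp * total_mapped_reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical library: (read count, gene length in bp) per gene.
counts = {"gene_A": (500, 1000), "gene_B": (500, 2000)}
total_mapped = 2_000_000

expression = {
    gene: rpkm(count, length, total_mapped)
    for gene, (count, length) in counts.items()
}
```

With equal read counts, the gene that is twice as long receives half the RPKM value, which is the length correction the method is designed to provide.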

Identification of pathogenicity islands (PAIs) 

The PAIs of S. zooepidemicus strain ATCC35246 were 
identified according to the following criteria. First, GC 
content and GC skew were determined, and regions 
showing differences from the average of the whole genome 
indicated potential PAI loci. Second, the PAI locus 
was present in ATCC35246, but was absent or scattered 
in the other three strains. Third, mobility genes, such as 
integrases, transposases and IS elements, were present at the 
boundaries of the locus. Fourth, virulence genes were 
located in the locus. Finally, these loci were confirmed 
using IslandViewer, a genomic island predictor that 
integrates three methods: IslandPick, IslandPath-DIMOB, 
and SIGI-HMM [67]. 
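
The first criterion can be sketched as a sliding-window scan of GC content and GC skew, defined here as (G - C)/(G + C). The window size, step and deviation threshold below are illustrative assumptions, not values reported in this study; only the genome-wide GC content of 41.65% comes from the genome description.

```python
from typing import List, Tuple


def gc_windows(seq: str, window: int = 1000,
               step: int = 500) -> List[Tuple[int, float, float]]:
    """Return (start, GC content, GC skew) for each window along seq.

    GC skew is (G - C) / (G + C); it is reported as 0.0 when a window
    contains no G or C. A sequence shorter than one window yields a
    single window covering the whole sequence.
    """
    seq = seq.upper()
    if not seq:
        return []
    results = []
    last_start = max(len(seq) - window + 1, 1)
    for start in range(0, last_start, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        gc_content = (g + c) / len(chunk)
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0
        results.append((start, gc_content, gc_skew))
    return results


def flag_deviating(windows: List[Tuple[int, float, float]],
                   genome_gc: float, threshold: float = 0.05) -> List[int]:
    """Start positions of windows whose GC content differs from the
    genome average by more than threshold (an assumed cutoff)."""
    return [start for start, gc, _ in windows if abs(gc - genome_gc) > threshold]


# Toy windows: the second one deviates from the genome-wide GC of 41.65%.
toy = [(0, 0.42, 0.0), (500, 0.30, 0.1)]
candidates = flag_deviating(toy, genome_gc=0.4165)
```

Windows flagged in this way are only candidates; the remaining criteria (strain specificity, mobility genes, virulence genes) still have to be checked for each locus.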

Construction of partial SeseCisland_3 knockout strain, 
ΔIsland3-Sz35246 

To construct the ΔpSET4s-LR plasmid, the upstream (LA) 
and downstream (RA) fragments of the Sz35246 target 
region were amplified. These two fragments were linked 
by fusion PCR and inserted into the pSET4s plasmid. 
Competent Sz35246 cells were subjected to electrotransformation 
with the ΔpSET4s-LR plasmid and positively 
transformed cells were selected at 28°C in the presence 
of spectinomycin. Bacteria at the mid logarithmic 
growth phase were diluted with THB containing spec- 
tinomycin and cultured at 28°C to the early logarithmic 
phase. The culture was then shifted to 37°C and incu- 
bated for 4 h. Subsequently, the cells were spread on 
THB and incubated at 28°C. Temperature resistant col- 
onies were screened at 37°C for the loss of vector- 
mediated spectinomycin resistance. The putative double 
crossover homologous recombinant mutants and some 
of the deleted genes in SeseCisland_3 were detected by 
PCR. 



In vivo challenges of ICR mice 

The Laboratory Animal Monitoring Committee of Jiangsu 
Province approved the experimental protocols. Two groups 
of eight-week-old ICR mice (10 animals per group) were 
used for in vivo infection studies. The wild-type Sz35246 
and ΔIsland3-Sz35246 were cultured separately in THB medium 
(Difco) at 37°C, with shaking at 180 rpm. When 
the OD600 reached 0.6, bacteria were pelleted, resuspended 
in PBS and diluted appropriately to 1.25 × 10^ CFU/ml 
(5 × LD50 per 0.2 ml, LD50 = 5 × 10^ CFU/ml) [68]. Mice were 
injected with 0.2 ml of bacterial suspension. Survival 
was monitored for 5 days. Survival curves and statistical 
analysis were generated with GraphPad Prism (Version 5.02). 
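
For readers who wish to reproduce survival curves outside GraphPad Prism, the following minimal sketch uses the Kaplan-Meier product-limit estimator. The text does not name the estimator or the statistical test, so this choice is an assumption, and the time-to-death data below are hypothetical rather than the observed results.

```python
from typing import List, Tuple


def kaplan_meier(times: List[int], died: List[bool]) -> List[Tuple[int, float]]:
    """Kaplan-Meier survival estimate.

    times: day of death, or last day of observation, for each animal.
    died:  True if the animal died on that day, False if it was still
           alive when observation ended (censored).
    Returns (day, surviving fraction) pairs, starting at (0, 1.0), with
    one entry for each day on which at least one death occurred.
    """
    at_risk = len(times)
    survival = 1.0
    curve = [(0, 1.0)]
    for day in sorted(set(times)):
        deaths = sum(1 for t, d in zip(times, died) if t == day and d)
        leaving = sum(1 for t in times if t == day)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((day, survival))
        at_risk -= leaving
    return curve


# Hypothetical group of 10 mice followed for 5 days:
# 4 die on day 1, 3 die on day 2, 3 are alive at the end of day 5.
times = [1] * 4 + [2] * 3 + [5] * 3
died = [True] * 7 + [False] * 3
curve = kaplan_meier(times, died)
```

Comparing two such curves (for example, wild type against the knockout strain) would additionally require a test such as the log-rank test, which is not implemented in this sketch.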

Additional files 



Additional file 1: Table S1. Genome information for S. zooepidemicus 
ATCC35246. 

Additional file 2: Table S2. Comparative gene expression analysis 
in vitro and in vivo for S. zooepidemicus ATCC35246. 

Additional file 3: Table S3. S. zooepidemicus ATCC35246-specific genes. 

Additional file 4: Table S4. Genes in SeseCisland_1 of the 
S. zooepidemicus ATCC35246 genome. 

Additional file 5: Table S5. Genes in SeseCisland_2 of the 
S. zooepidemicus ATCC35246 genome. 

Additional file 6: Table S6. Genes in SeseCisland_3 of the S. zooepidemicus 
ATCC35246 genome. 

Additional file 7: Table S7. Genes in SeseCisland_4 of the 
S. zooepidemicus ATCC35246 genome. 

Additional file 8: Table S8. PCR detection of genes located in the 
SeseCisland_3 deleted region. Red: Genes still existed in Sz35246 